Ranking Documents in Thesaurus-Based Boolean Retrieval Systems

Authors

  • Joon Ho Lee
  • Myoung-Ho Kim
  • Yoon-Joon Lee
Abstract

In this paper we investigate document ranking methods in thesaurus-based Boolean retrieval systems and propose a new thesaurus-based ranking algorithm called the Extended Relevance (E-Relevance) algorithm. The E-Relevance algorithm integrates the extended Boolean model and the thesaurus-based relevance algorithm. Because the E-Relevance algorithm has all the desirable properties of the extended Boolean model, it avoids the various problems of previous thesaurus-based ranking algorithms. It also ranks documents effectively by using term dependence information from the thesaurus. A performance comparison shows that the proposed algorithm achieves higher retrieval effectiveness than previously proposed algorithms.
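
The abstract names the extended Boolean model as one of the two ingredients of E-Relevance. As a rough illustration only, the sketch below assumes the well-known p-norm formulation of that model (Salton, Fox and Wu) with unweighted query terms. It is not the E-Relevance algorithm itself, and it does not use any thesaurus information. The function names `pnorm_or` and `pnorm_and` are hypothetical, and document term weights are assumed to lie in [0, 1].

```python
from typing import Sequence


def pnorm_or(weights: Sequence[float], p: float = 2.0) -> float:
    """Similarity of a document to (t1 OR t2 OR ...) under the p-norm model.

    `weights` are the document's term weights in [0, 1]; query terms are
    assumed to be unweighted (an assumption made for this sketch).
    """
    if not weights:
        raise ValueError("at least one term weight is required")
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def pnorm_and(weights: Sequence[float], p: float = 2.0) -> float:
    """Similarity of a document to (t1 AND t2 AND ...) under the p-norm model."""
    if not weights:
        raise ValueError("at least one term weight is required")
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


if __name__ == "__main__":
    # A document containing only the first of two query terms.
    doc_weights = [1.0, 0.0]
    print("OR  score:", pnorm_or(doc_weights, p=2.0))
    print("AND score:", pnorm_and(doc_weights, p=2.0))
```

Under this formulation, p = 1 makes both operators collapse to the mean of the term weights, while large p approaches strict Boolean behaviour (maximum for OR, minimum for AND). This is why a document that matches only some of the query terms still receives a graded score rather than being rejected outright.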

Similar resources

Information Retrieval Methods for Literary Texts (Norbert Fuhr)

Information retrieval focuses on content-based searching in text documents. For this purpose, first text content must be represented, by using a representation language (like thesauri or classification schemes) or by performing free-text search. The latter approach uses either string-based or computer-linguistic methods (stemming, dictionary lookup, syntax analysis). For retrieval, weighting an...

Semantic-based Medical Records Retrieval via Medical-context Aware Query Expansion and Ranking

Efficient retrieval of medical records requires contextual understanding of both the query and the records' contents. Such understanding improves search effectiveness beyond mere keyword matching and is supported by semantic analysis, for example through the MeSH thesaurus. The query is annotated and expanded with information from the deep medical contextual understanding. This...

Partial Boolean Algebras as Models for Thesaurus Integration

A model of a collection of documents based on partial Boolean algebras is presented. This model has been considered while analysing a problem of integration of thesauri. Some properties of partial Boolean algebras are exploited in defining theoretical tools for information retrieval associated to this model. Such tools are a logical language representing queries to the system and a browsing mec...

Document Ranking Method for High Precision Rate

Many information retrieval (IR) systems retrieve relevant documents based on exact keyword matching between a query and documents. This method degrades the precision rate. To solve the problem, we collected semantically related words and manually assigned the semantic relationships used in general thesauri, together with a special relationship called keyfact term (FT). In addition to the semantic know...

Fujitsu Laboratories TREC7 Report

In our first participation in TREC, our focus was on improving the basic ranking systems and applying text clustering techniques for query expansion. We tested a variety of techniques, including reference measures, passage retrieval, and data fusion, for the basic ranking systems. Some techniques were used in the official run; others were not used because of time limitations. We applied the text cl...


Journal:
  • Inf. Process. Manage.

Volume 30

Publication date: 1994